Skip to content

feat: migrating to validating admission policies for our validating logic (1/)#708

Open
michaelawyu (michaelawyu) wants to merge 4 commits into
kubefleet-dev:mainfrom
michaelawyu:feat/migrate-to-vap-1
Open

feat: migrating to validating admission policies for our validating logic (1/)#708
michaelawyu (michaelawyu) wants to merge 4 commits into
kubefleet-dev:mainfrom
michaelawyu:feat/migrate-to-vap-1

Conversation

@michaelawyu
Copy link
Copy Markdown
Member

Description of your changes

This PR is the first of many following PRs to migrate to validating admission policies for applicable validation logic.

Specifically, this PR:

  • adds a admission policy manager for managing admission policies when the agent starts up. This manager is best used for environments where policies cannot be easily deployed/refreshed as part of the agent Helm charts.
  • adds two policy generator: one for denying writes to pods + replica sets in non-reserved namespaces, the other one for denying writes to service accounts and creating token requests in reserved namespaces.

Releated to #707.

I have:

  • [*] Associated this change with a known KubeFleet Issue (Bug, Feature, etc).
  • [*] Run make reviewable to ensure this PR is ready for review.

How has this code been tested

  • [*] Integration tests

Special notes for your reviewer

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
@michaelawyu
Copy link
Copy Markdown
Member Author

Please note that for this PR the manager is not enabled in the hub agent yet.

To control the PR size, additional tests will be submitted as a separate PR.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 66.35514% with 108 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pkg/admissionpolicymanager/manager.go 46.42% 68 Missing and 22 partials ⚠️
...kg/admissionpolicymanager/svcaccountsntokenreqs.go 83.52% 8 Missing and 6 partials ⚠️
pkg/admissionpolicymanager/podsnreplicasets.go 93.93% 2 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the first slice of work to move KubeFleet's validation logic from validating webhooks to Kubernetes ValidatingAdmissionPolicies (VAP). It adds a startup-time policy manager that reconciles a configurable set of generated VAPs and their bindings, plus two initial generators (deny pod/replicaset writes outside reserved namespaces, deny serviceaccount/token writes in reserved namespaces). It also exports the namespace prefix constants from pkg/utils.

Changes:

  • New pkg/admissionpolicymanager package: manager that create-or-updates and garbage-collects VAPs labeled by Fleet, with two generators (pods/replicasets, serviceaccounts/tokenrequests).
  • Integration test suite (envtest) validating the manager produces the expected VAP/VAPB objects (effects deferred to E2E).
  • Exports KubePrefix, FleetPrefix, FleetMemberNamespacePrefix from pkg/utils/common.go for reuse in the generator configs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
pkg/utils/common.go Exports previously unexported namespace prefix constants so the admission policy generator configs can reuse them.
pkg/admissionpolicymanager/manager.go Core policy manager: reflection-based generator selection, create-or-update with retry, GC of orphan policies/bindings via label selector.
pkg/admissionpolicymanager/configs.go Top-level configuration struct exposing per-generator config and a default.
pkg/admissionpolicymanager/podsnreplicasets.go Generator producing a VAP that denies CREATE of pods/replicasets outside reserved namespaces.
pkg/admissionpolicymanager/svcaccountsntokenreqs.go Generator producing a VAP that denies serviceaccount CRUD and tokenrequest CREATE in reserved namespaces, with user/group whitelist.
pkg/admissionpolicymanager/suite_test.go Ginkgo/envtest suite bootstrap; starts the policy manager once.
pkg/admissionpolicymanager/manager_integration_test.go Asserts the expected VAP and VAPB objects exist with correct spec/labels.

Comment thread pkg/admissionpolicymanager/svcaccountsntokenreqs.go
Comment thread pkg/admissionpolicymanager/svcaccountsntokenreqs.go
Comment thread pkg/admissionpolicymanager/podsnreplicasets.go
Comment thread pkg/admissionpolicymanager/podsnreplicasets.go Outdated
Comment thread pkg/admissionpolicymanager/manager.go
Comment thread pkg/utils/common.go Outdated
Comment on lines +102 to +132
// Exempt whitelisted users from this admission policy.
for _, username := range g.WhitelistedUsernames {
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, username))
}
// Exempt whitelisted user groups from this admission policy.
for _, userGroup := range g.WhitelistedUserGroups {
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`"%s" in request.userInfo.groups`, userGroup))
}
// Exempt requests from the Kubernetes scheduler, any of the nodes, and (esp.) the
// Kubernetes controller manager from this admission policy.
//
// Important: the Kubernetes controller manager, when deployed with the option
// --use-service-account-credentials=true, creates a service account token for many of its controllers
// and uses those tokens to authenticate to the Kubernetes API server. It retrieves a token
// via the TokenRequest API; failure to exempt this scenario may lead to critical errors.
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, kubeSchedulerUserName))
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, kubeControllerManagerUserName))
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`"%s" in request.userInfo.groups`, kubeNodeUserGroup))
// Exempt requests from admin users from this admission policy.
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`"%s" in request.userInfo.groups`, adminUserGroup))

celExprAcc := strings.Join(celExprAccSegs, " || ")

celExprNSSegs := []string{}
for _, prefix := range g.ReservedNamespacePrefixes {
celExprNSSegs = append(celExprNSSegs, fmt.Sprintf(`object.metadata.namespace.startsWith("%s")`, prefix))
}
celExprNS := strings.Join(celExprNSSegs, " || ")

celExpr := fmt.Sprintf("!(%s) || (%s)", celExprNS, celExprAcc)

Copy link
Copy Markdown
Member Author

@michaelawyu michaelawyu (michaelawyu) May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this restriction is added in consistency with our existing webhook setup. system:masters is supposed to have unlimited access to all resources BTW.

Comment on lines +119 to +121
// The error message has been set to match with the checks in some of the E2E tests
// in our existing release pipeline.
Message: "creating pods and replicas is disallowed in the fleet hub cluster",
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N/A: as mentioned, the error msg is explicitly kept in the same style as some of the existing checks for releasing purposes.

Comment thread pkg/admissionpolicymanager/podsnreplicasets.go Outdated
Comment thread pkg/admissionpolicymanager/manager_integration_test.go
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>

// AllGenerators returns a copy of all available policy generators.
func AllGenerators() sets.Set[string] {
return allGenerators.Clone()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get all generator names by reflecting over DefaultPolicyGeneratorConfigs the same way preparePolicyGenerators do?

Keeping them in sync can prevent people adding new ValidatingAdmissionPolicyGenerator but forget to add to one of allGenerators or DefaultPolicyGeneratorConfigs

)

// reservedNamespacePrefixRegexp matches valid namespace prefix characters (DNS label subset).
var reservedNamespacePrefixRegexp = regexp.MustCompile(`^[a-z0-9-]+$`)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in common.go (or another common file that doesn't belong to any specific VAP Generator) because it is used by both podsnreplicasets.go and svcaccountsntokenreqs.go?

type ValidatingAdmissionPolicyGenerator interface {
Name() string
Validate() error
Policies() []*admissionregistrationv1.ValidatingAdmissionPolicy
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of independent Policies and PolicyBindings members, can we do

type PolicyWithBindings struct {
    Policy   admissionregistrationv1.ValidatingAdmissionPolicy
    Bindings []admissionregistrationv1.ValidatingAdmissionPolicyBinding
}

type ValidatingAdmissionPolicyGenerator interface {
	Name() string
	Validate() error
    PoliciesWithBindings() []PolicyWithBindings
}

?

The concern is that right now the relationship between Policy and Bindings is not explicit and is more error-proned (e.g. someone created a Policy but no Binding; someone deleted a Policy but forgot to delete the corresponding Bindings)

if policy.Labels == nil {
policy.Labels = make(map[string]string)
}
policy.Labels[VAPManagedByKubeFleetLabelKey] = VAPManagedByKubeFleetLabelValue
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you define a map like this on top of file

var AdmissionPolicyManagerLabels = map[string]string{
    VAPManagedByKubeFleetLabelKey:  VAPManagedByKubeFleetLabelValue,
    VAPPartOfKubeFleetLabelKey:     VAPPartOfKubeFleetLabelValue,
    VAPComponentKubeFleetLabelKey:  VAPComponentAdmissionPolicyManagerLabelValue,
}

this can be replaced with

maps.Copy(policy.Labels, AdmissionPolicyManagerLabels)

can do similar things to 2 other places in this file since these 3 labels and values are always used together for all of AdmissionPolicyManager usages

if policyBinding.Labels == nil {
policyBinding.Labels = make(map[string]string)
}
policyBinding.Labels[VAPManagedByKubeFleetLabelKey] = VAPManagedByKubeFleetLabelValue
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you define a map like this on top of file

var AdmissionPolicyManagerLabels = map[string]string{
    VAPManagedByKubeFleetLabelKey:  VAPManagedByKubeFleetLabelValue,
    VAPPartOfKubeFleetLabelKey:     VAPPartOfKubeFleetLabelValue,
    VAPComponentKubeFleetLabelKey:  VAPComponentAdmissionPolicyManagerLabelValue,
}

this can be replaced with

maps.Copy(policyBinding.Labels, AdmissionPolicyManagerLabels)

err := retry.OnError(policyRWOpBackoff, func(err error) bool {
// Retry on any error expect for context cancellation. Note that nil errors are not passed to this function,
// and AlreadyExists errors will not occur.
if ctx.Err() != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func retryOnAnyErrorUnlessCancelled(ctx context.Context) func(error) bool {
    return func(err error) bool {
        return ctx.Err() == nil
    }
}

and then replace 6 of these usages with

retry.OnError(policyRWOpBackoff, retryOnAnyErrorUnlessCancelled(ctx), func() error {
    // ...
})

?

for _, gen := range m.enabledPolicyGenerators {
// As a sanity check, do one more round of validation.
//
// Normally this check would never fail as the generators have been validated before
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems to be suggesting that in manager initialization (i.e. New method), the Validate methods were called. But they were not called in New?

Rule: admissionregistrationv1.Rule{
APIGroups: []string{""},
APIVersions: []string{"v1"},
Resources: []string{"pods"},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have constants for "pods" and "replicasets" in webhook.go. Shall we start sharing them across webhook package and this new package by pulling them out into a common package? And eventually webhook.go will be deleted in one PR together with other webhook logic? Want to make sure that we still have these as constants after replacing webhook by VAP.

// Check if any whitelisted username or user group contains characters that are
// illegal in a CEL string literal (\ and ").
for _, username := range g.WhitelistedUsernames {
if strings.ContainsAny(username, `"\`) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 illegal characters are applicable to all CELs, so a similar check needs to be done for many more future VAP generators? Maybe extract into a function so that it is more discoverable for future generators?

// ValidateCELStringLiterals checks that none of the values contain characters
// that would break or inject into a CEL double-quoted string literal.
// The fieldName is used in the error message for context.
func ValidateCELStringLiterals(values []string, fieldName string) error {
    for _, v := range values {
        if strings.ContainsAny(v, `"\`) {
            return errors.NewUserError(nil,
                fmt.Sprintf("%s contains characters illegal in a CEL string literal", fieldName),
                fieldName, v)
        }
    }
    return nil
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these 2 become:

    if err := ValidateCELStringLiterals(g.WhitelistedUsernames, "whitelisted username"); err != nil {
        return err
    }
    if err := ValidateCELStringLiterals(g.WhitelistedUserGroups, "whitelisted user group"); err != nil {
        return err
    }


// Exempt whitelisted users from this admission policy.
for _, username := range g.WhitelistedUsernames {
celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, username))
Copy link
Copy Markdown
Member

@weng271190436 Wei Weng (weng271190436) May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a concern that this CEL construction code is not very readable. Might be difficult for someone to spot a mistake if there is any.

Wondering if it is worth investing in a celbuilder.go like this?

// CELExpr represents a CEL expression fragment.
type CELExpr string

// String returns the CEL expression as a string.
func (e CELExpr) String() string {
    return string(e)
}

// UsernameEquals returns a CEL expression matching a specific username.
func UsernameEquals(username string) CELExpr {
    return CELExpr(fmt.Sprintf(`request.userInfo.username == "%s"`, username))
}

// InGroup returns a CEL expression matching membership in a group.
func InGroup(group string) CELExpr {
    return CELExpr(fmt.Sprintf(`"%s" in request.userInfo.groups`, group))
}

// NamespaceStartsWith returns a CEL expression matching a namespace prefix.
func NamespaceStartsWith(prefix string) CELExpr {
    return CELExpr(fmt.Sprintf(`request.namespace.startsWith("%s")`, prefix))
}

// Or joins expressions with ||.
func Or(exprs ...CELExpr) CELExpr {
    parts := make([]string, len(exprs))
    for i, e := range exprs {
        parts[i] = string(e)
    }
    return CELExpr(strings.Join(parts, " || "))
}

// Not negates an expression.
func Not(expr CELExpr) CELExpr {
    return CELExpr(fmt.Sprintf("!(%s)", expr))
}

then we can build and use CEL like this:

func (g *SvcAccountsAndTokenRequestsVAPGenerator) Policies() []*admissionregistrationv1.ValidatingAdmissionPolicy {
    // Namespace condition
    var nsClauses []CELExpr
    for _, prefix := range g.ReservedNamespacePrefixes {
        nsClauses = append(nsClauses, NamespaceStartsWith(prefix))
    }

    // User exemptions
    var exemptions []CELExpr
    for _, u := range g.WhitelistedUsernames {
        exemptions = append(exemptions, UsernameEquals(u))
    }
    for _, grp := range g.WhitelistedUserGroups {
        exemptions = append(exemptions, InGroup(grp))
    }
    exemptions = append(exemptions,
        UsernameEquals(kubeSchedulerUserName),
        UsernameEquals(kubeControllerManagerUserName),
        InGroup(kubeNodeUserGroup),
        InGroup(adminUserGroup),
    )

    // Allow if: not reserved namespace OR user is exempt
    celExpr := Or(Not(Or(nsClauses...)), Or(exemptions...))

    policy := &admissionregistrationv1.ValidatingAdmissionPolicy{
        // ...
        Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
            Validations: []admissionregistrationv1.Validation{
                {
                    Expression: celExpr.String(),
                    Message:    "writing service accounts in reserved namespaces or requesting tokens from such service accounts is disallowed",
                    Reason:     ptr.To(metav1.StatusReasonForbidden),
                },
            },
        },
    }
    return []*admissionregistrationv1.ValidatingAdmissionPolicy{policy}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

podsnreplicasets.go can also do similar:

func (g *PodsAndReplicaSetsVAPGenerator) Policies() []*admissionregistrationv1.ValidatingAdmissionPolicy {
    var nsClauses []CELExpr
    for _, prefix := range g.ReservedNamespacePrefixes {
        nsClauses = append(nsClauses, NamespaceStartsWith(prefix))
    }

    celExpr := Not(Or(nsClauses...))

    policy := &admissionregistrationv1.ValidatingAdmissionPolicy{
        // ...
        Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
            Validations: []admissionregistrationv1.Validation{
                {
                    Expression: celExpr.String(),
                    Message:    "creating pods and replicas is disallowed in the fleet hub cluster",
                    Reason:     ptr.To(metav1.StatusReasonForbidden),
                },
            },
        },
    }
    return []*admissionregistrationv1.ValidatingAdmissionPolicy{policy}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants